100 research outputs found
Neural Semantic Parsing over Multiple Knowledge-bases
A fundamental challenge in developing semantic parsers is the paucity of
strong supervision in the form of language utterances annotated with logical
forms. In this paper, we propose to exploit structural regularities in language
across different domains, and to train semantic parsers over multiple knowledge-bases
(KBs), while sharing information across datasets. We find that we can
substantially improve parsing accuracy by training a single
sequence-to-sequence model over multiple KBs, when providing an encoding of the
domain at decoding time. Our model achieves state-of-the-art performance on the
Overnight dataset (containing eight domains), improves performance over
single-KB baselines from 75.6% to 79.6%, and obtains a 7x reduction in the
number of model parameters.
Comment: Accepted to ACL 201
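The abstract says the shared model is given "an encoding of the domain at decoding time". One simple way to realize that idea (an assumption for illustration, not the paper's actual implementation) is to mark each utterance with a special domain token, so a single sequence-to-sequence parser can condition on which knowledge-base it is parsing into. The function name and token format below are hypothetical:

```python
# Hypothetical sketch (not the paper's code): sharing one sequence-to-sequence
# parser across knowledge-bases by marking each utterance with its domain.

def encode_with_domain(tokens, domain, known_domains):
    """Prepend a domain marker token so a shared decoder can condition on
    which knowledge-base the utterance targets."""
    if domain not in known_domains:
        raise ValueError(f"unknown domain: {domain!r}")
    return [f"<{domain}>"] + list(tokens)

# Toy usage: utterances from different Overnight-style domains share one
# vocabulary and one model; only the marker token differs.
utterance = encode_with_domain(["show", "all", "meetings"], "calendar",
                               {"calendar", "housing"})
print(utterance)  # ['<calendar>', 'show', 'all', 'meetings']
```

Because the marker is just another vocabulary item, the same parameters serve every domain, which is consistent with the reported 7x parameter reduction relative to training one model per KB.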
Differentiable Scene Graphs
Reasoning about complex visual scenes involves perception of entities and
their relations. Scene graphs provide a natural representation for reasoning
tasks, by assigning labels to both entities (nodes) and relations (edges).
Unfortunately, reasoning systems based on scene graphs (SGs) are typically
trained in a two-step procedure: first, a model is trained to predict SGs from
images; then, a separate model is trained to reason over the predicted SGs. In
many domains it is preferable to train systems jointly, end to end, but SGs are
rarely used as intermediate components in visual reasoning systems: being
discrete and sparse, scene-graph representations are non-differentiable and
difficult to optimize. Here we propose Differentiable Scene Graphs (DSGs),
an image representation that is amenable to differentiable end-to-end
optimization, and requires supervision only from the downstream tasks. DSGs
provide a dense representation for all regions and pairs of regions, and do not
spend modelling capacity on areas of the images that do not contain objects or
relations of interest. We evaluate our model on the challenging task of
identifying referring relationships (RR) in three benchmark datasets, Visual
Genome, VRD and CLEVR. We describe a multi-task objective, and train in an
end-to-end manner supervised by the downstream RR task. Using DSGs as an
intermediate representation leads to new state-of-the-art performance.
Comment: Winter Conference on Applications of Computer Vision (WACV), 202
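The key structural idea in the abstract is that a DSG is dense: instead of predicting a discrete graph, the model keeps one continuous descriptor per region and one per ordered pair of regions, so everything stays differentiable. A minimal sketch of that layout, with hypothetical names and toy scalar "features" standing in for learned vectors:

```python
# Hypothetical sketch (not the authors' implementation): a dense scene-graph
# layout. For N region features we keep one node descriptor per region and
# one edge descriptor per ordered pair, so no discrete graph structure has
# to be predicted and gradients can flow through every entry.

def dense_scene_graph(regions, node_fn, edge_fn):
    """Return (nodes, edges): nodes[i] describes region i, and
    edges[i][j] describes the ordered pair (i, j)."""
    nodes = [node_fn(r) for r in regions]
    edges = [[edge_fn(a, b) for b in regions] for a in regions]
    return nodes, edges

# Toy usage: node descriptor is the feature itself, edge descriptor is the
# pairwise difference (ordered, so edges[i][j] != edges[j][i] in general).
nodes, edges = dense_scene_graph([1.0, 4.0], lambda r: r, lambda a, b: b - a)
print(nodes)        # [1.0, 4.0]
print(edges[0][1])  # 3.0
```

In a real model the `node_fn`/`edge_fn` stand-ins would be learned networks and the N×N edge table would be a tensor, but the dense node-plus-pair structure is what lets the downstream referring-relationships loss supervise the representation end to end.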
- …